A markup language is a text-encoding system that specifies the structure and formatting of a document and potentially the relationships among its parts. Markup can control the display of a document or enrich its content to facilitate automated processing.
A markup language is a set of rules governing what markup information may be included in a document and how it is combined with the content of the document in a way that facilitates use by humans and computer programs. The idea and terminology evolved from the marking up of paper manuscripts (e.g., with revision instructions by editors), traditionally written with a red pen or blue pencil on authors' manuscripts.
Older markup languages, which typically focus on typesetting and presentation, include troff, TeX, and LaTeX. Scribe and most modern markup languages, such as XML, identify document components (for example headings, paragraphs, and tables), with the expectation that technology, such as stylesheets, will be used to apply formatting or other processing.
Some markup languages, such as the widely used HTML, have pre-defined presentation semantics, meaning that their specifications prescribe some aspects of how to present the structured data on particular media. HTML, like DocBook, Open eBook, JATS, and many others, is based on the markup meta-languages SGML and XML. That is, SGML and XML allow designers to specify particular schemas, which determine which elements, attributes, and other features are permitted, and where.
A key characteristic of most markup languages is that they allow combining markup with content such as text and pictures. For example, if a few words in a sentence need to be emphasized, or identified as a proper name, defined term, or another special item, the markup may be inserted between the characters of the sentence.
For centuries, this task was done primarily by skilled typographers known as "markup men" [Allan Woods, Modern Newspaper Production (New York: Harper & Row, 1963), 85; Stewart Harral, Profitable Public Relations for Newspapers (Ann Arbor: J. W. Edwards, 1957), 76; and Chiarella v. United States] or "markers" [From the Notebooks of H. J. H & D. H. An on Composition, Kingsport Press Inc., undated (1960s)], who marked up text to indicate what typeface, style, and size should be applied to each part, and then passed the manuscript to others for typesetting by hand or machine.
The markup was also commonly applied by others involved in preparing a publication, and by authors themselves, all of whom might also mark things such as corrections and changes.
There is considerable overlap and concurrent use of markup types. In modern word-processing systems, presentational markup is often saved in descriptive-markup-oriented systems such as XML, and then processed procedurally. The programming in procedural-markup systems, such as TeX, may be used to create higher-level markup systems that are more descriptive in nature, such as LaTeX.
In recent years, several markup languages have been developed with ease of use as a key goal, and without input from standards organizations, aimed at allowing authors to create formatted text easily. These are sometimes called lightweight markup languages. Markdown, BBCode, and Wikitext are examples of such languages.
Brian Reid, in his 1980 dissertation at Carnegie Mellon University, developed a theory and working implementation of descriptive markup in actual use. However, IBM researcher Charles Goldfarb is more commonly considered the inventor of markup languages. Goldfarb developed the basic idea while working on a primitive document management system intended for law firms in 1969, and helped invent IBM's Generalized Markup Language (GML) later that same year. GML was first publicly disclosed in 1973.
In 1975, Goldfarb moved from Cambridge, Massachusetts to Silicon Valley and became a product planner at the IBM Almaden Research Center. There, he convinced IBM's executives to deploy GML commercially in 1978 as part of IBM's Document Composition Facility product, and it was widely used in business within a few years.
Standard Generalized Markup Language (SGML), the first standard descriptive markup language, was based on both GML and GenCode. It was the result of an International Organization for Standardization (ISO) committee that was first chaired by Tunnicliffe, and which Goldfarb also worked on beginning in 1974. Goldfarb eventually became chair of the committee. SGML was first released by ISO as the ISO 8879 standard in October 1986.
In the early 1980s, the idea that markup should focus on the structural aspects of a document and leave the visual presentation of that structure to the interpreter led to the creation of SGML. The language was developed by a committee chaired by Goldfarb. It incorporated ideas from many different sources, including Tunnicliffe's project, GenCode. Sharon Adler, Anders Berglund, and James A. Marke were also key members of the SGML committee.
SGML specifies a syntax for including the markup in documents, as well as one for separately describing what tags are allowed, and where (the document type definition (DTD), later known as a schema). This allows authors to create and use any markup they want, selecting tags that make the most sense to them and are named in their own natural languages, while also allowing automated verification. Thus, SGML is properly a metalanguage, and many markup languages are derived from it. From the late 1980s onward, most substantial new markup languages have been based on SGML, including the Text Encoding Initiative (TEI) guidelines and DocBook. SGML was promulgated as the ISO 8879 standard in 1986.
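The idea of describing separately which tags are allowed, and where, can be illustrated with a minimal sketch. It is not an SGML or DTD implementation: it uses XML syntax and Python's standard `xml.etree.ElementTree` parser, and the `ALLOWED_CHILDREN` table, the `find_violations` function, and the `memo` vocabulary are hypothetical names invented for this illustration. A real DTD also constrains order, repetition, attributes, and text content.

```python
import xml.etree.ElementTree as ET

# Hypothetical, highly simplified stand-in for a DTD: each element name maps
# to the set of child elements it may contain.
ALLOWED_CHILDREN = {
    "memo": {"to", "body"},
    "to": set(),
    "body": {"para"},
    "para": set(),
}


def find_violations(element, allowed=ALLOWED_CHILDREN):
    """Return messages for elements that the simplified schema does not permit."""
    if element.tag not in allowed:
        return [f"unknown element <{element.tag}>"]
    problems = []
    for child in element:
        if child.tag not in allowed[element.tag]:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        else:
            # The child is permitted here, so check its own children in turn.
            problems.extend(find_violations(child, allowed))
    return problems


valid = ET.fromstring("<memo><to>Staff</to><body><para>Hello.</para></body></memo>")
invalid = ET.fromstring("<memo><body><to>Staff</to></body></memo>")

print(find_violations(valid))    # no violations reported
print(find_violations(invalid))  # reports the misplaced <to> element
```

Because the rules live in a table separate from the documents, the same checker can verify any document that claims to use the `memo` vocabulary, which is the kind of automated verification described above.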
SGML found wide acceptance and use in fields with very large-scale documentation requirements. However, many found it cumbersome and difficult to learn—a side effect of its design attempting to do too much and being too flexible. For example, SGML made end tags (or start tags, or both) optional in certain contexts, because its developers thought markup would be done manually by overworked support staff who would appreciate saving keystrokes.
Berners-Lee considered HTML an SGML application. The Internet Engineering Task Force (IETF) formally defined it as such with the mid-1993 publication of the first proposal for an HTML specification: "Hypertext Markup Language (HTML)" by Berners-Lee and Dan Connolly, which included an SGML DTD to define the grammar. Many of the HTML text elements are found in the 1988 ISO technical report TR 9537 Techniques for using SGML, which in turn covers the features of early text formatting languages, such as that used by the RUNOFF command developed in the early 1960s for the Compatible Time-Sharing System operating system. These formatting commands were derived from those used by typesetters to manually format documents. Steven DeRose argues that HTML's use of descriptive markup (and the influence of SGML in particular) was a major factor in the success of the Web, because of the flexibility and extensibility that it enabled (DeRose, Steven J. "The SGML FAQ Book". Boston: Kluwer Academic Publishers, 1997). HTML became the main markup language for creating web pages and other information that can be displayed in a web browser and is likely the most used markup language in the world in the 21st century.
XML adoption was hastened by the fact that every XML document can be written so that it is also an SGML document, allowing existing SGML users and software to switch to XML fairly easily. At the same time, XML eliminates many complex features of SGML to simplify implementation for uses such as documents and publications. It appears to balance simplicity and flexibility while supporting very robust schema definitions and validation tools, and it was rapidly adopted for many uses. Besides documents, XML is now widely used for communicating data between applications, serializing program data, hardware communication protocols, vector graphics, and other purposes.
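As a small illustration of XML used for serializing program data, the following sketch writes a flat record as XML and reads it back using Python's standard `xml.etree.ElementTree` module. The `reading` record, its field names, and the `to_xml` and `from_xml` helpers are assumptions made up for this example, not part of any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical program data to exchange between two applications.
reading = {"sensor": "t-01", "unit": "celsius", "value": "21.5"}


def to_xml(record: dict) -> str:
    """Serialize a flat dict of strings as one element with a child per key."""
    root = ET.Element("reading")
    for key, text in record.items():
        child = ET.SubElement(root, key)
        child.text = text
    return ET.tostring(root, encoding="unicode")


def from_xml(document: str) -> dict:
    """Parse XML produced by to_xml back into a dict."""
    root = ET.fromstring(document)
    return {child.tag: child.text for child in root}


xml_text = to_xml(reading)
print(xml_text)
print(from_xml(xml_text) == reading)  # the round trip preserves the record
```

The sketch stores every value as text; a real exchange format would normally also fix data types and structure in a schema so that both applications can validate what they receive.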
One of the most noticeable differences between HTML and XHTML is the latter's rule that all tags must be closed: empty HTML tags such as <br> must either be closed with a regular end-tag, or replaced by a special form: <br /> (the space before the slash is optional but frequently used, because it enables some pre-XML web browsers and SGML parsers to accept the tag). Another difference is that all attribute values in XHTML tags must be quoted. Both these differences are commonly criticized as verbose but also praised because they make it far easier to detect, localize, and repair errors. Finally, all tag and attribute names within the XHTML namespace must be lowercase to be valid. HTML, on the other hand, is case-insensitive.
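The closing and quoting rules can be seen by feeding fragments to a generic XML parser, here Python's standard `xml.etree.ElementTree`. This is only a sketch: the `is_well_formed` helper is a name invented for this example, and it tests XML well-formedness only, not XHTML validity (it would not, for instance, reject uppercase tag names).

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# HTML style: the empty br element is never closed.
html_style = "<p>first line<br>second line</p>"
# XHTML style: the empty element uses the self-closing form.
xhtml_style = "<p>first line<br />second line</p>"
# An unquoted attribute value, tolerated by HTML but not by XML.
unquoted = "<p class=note>text</p>"
quoted = '<p class="note">text</p>'

for sample in (html_style, xhtml_style, unquoted, quoted):
    print(is_well_formed(sample), sample)
```

The parser stops at the first violation and reports where it occurred, which is the kind of error detection and localization that the stricter rules are praised for.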
At Mozilla, we’re a global community of
working together to keep the Internet alive and accessible, so people worldwide can be informed contributors and creators of the Web. We believe this act of human collaboration across an open platform is essential to individual growth and our collective future.
Read the Mozilla Manifesto to learn even more about the values and principles that guide the pursuit of our mission.
Mozilla is cool
The codes enclosed in angle-brackets <like this> are markup instructions (known as tags), while the text between these instructions is the actual text of the document. The codes h1, p, and em are examples of semantic markup, in that they describe the intended purpose or the meaning of the text they include. Specifically, h1 means the enclosed text is a first-level heading, p means a paragraph, and em means an emphasized word or phrase. A program interpreting such structural markup may apply its own rules or styles for presenting the various pieces of text, using different typefaces, boldness, font size, indentation, color, or other styles, as desired. For example, a tag such as h1 might be presented in a large bold sans-serif typeface in an article, or it might be underscored in a monospaced (fixed-width font) document, or it might not change the presentation at all.
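To make concrete how an interpreting program may apply its own presentation rules to semantic tags, here is a minimal sketch built on Python's standard `html.parser` module. The `PlainTextRenderer` class, the `render` function, the sample markup, and the chosen styles (underlining an h1 with equals signs, wrapping em text in asterisks) are all assumptions made for this illustration; another program could legitimately present the same markup quite differently.

```python
from html.parser import HTMLParser


class PlainTextRenderer(HTMLParser):
    """Render h1, p, and em with this program's own (hypothetical) style rules."""

    def __init__(self):
        super().__init__()
        self.lines = []   # finished output lines
        self.buffer = []  # text collected for the current block

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            self.buffer.append("*")

    def handle_endtag(self, tag):
        if tag == "em":
            self.buffer.append("*")
        elif tag in ("h1", "p"):
            text = "".join(self.buffer).strip()
            self.buffer = []
            self.lines.append(text)
            if tag == "h1":
                # This renderer chooses to underline first-level headings.
                self.lines.append("=" * len(text))

    def handle_data(self, data):
        self.buffer.append(data)


def render(markup: str) -> str:
    renderer = PlainTextRenderer()
    renderer.feed(markup)
    renderer.close()
    return "\n".join(renderer.lines)


sample = "<h1>Markup</h1><p>Tags describe <em>meaning</em>, not appearance.</p>"
print(render(sample))
```

The markup itself says nothing about underlining or asterisks; those choices belong entirely to the renderer, which is what separates semantic markup from presentational markup.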
In contrast, the i tag in HTML 4 is an example of presentational markup, which is generally used to specify a characteristic of the text without specifying the reason for that appearance. In this case, the i element dictates the use of an italic typeface. However, in HTML 5, this element has been repurposed with a more semantic usage: to denote "a span of text in an alternate voice or mood, or otherwise offset from the normal prose in a manner indicating a different quality of text". For example, it is appropriate to use the i element to indicate a taxonomic designation or a phrase in another language. The change was made to make the transition from HTML 4 to HTML 5 as smooth as possible, so that deprecated uses of presentational elements would preserve the most likely intended meaning.
TEI has published extensive guidelines for how to encode texts of interest in the humanities and social sciences, developed through years of international cooperative work. These guidelines are used for encoding historical documents, and the works of particular scholars, periods, and genres.
The use of XML has also led to the possibility of combining multiple markup languages into a single profile, like XHTML+SMIL and XHTML+MathML+SVG (An XHTML + MathML + SVG Profile. W3C. August 9, 2002. Retrieved 2021-08-16).